This task analyses crime and climate data for Colchester in 2024–25 to determine whether climatic patterns correlate with street-level crime. The crime2024-25.csv dataset records individual crime incidents by type, location, and date, giving a complete picture of criminal activity. Two climate datasets, temp2024-25.csv and temp2023-24.csv, contain daily weather information from a local station, including temperature, humidity, precipitation, and other meteorological parameters. By integrating crime and weather information, the analysis seeks to determine whether weather factors correlate with variations in the frequency or types of crime. The findings may help inform local law enforcement planning and resource deployment. In addition, comparing the 2024–25 climate data with the previous year makes it possible to identify any major climatic anomalies that may have influenced crime patterns. This cross-disciplinary integration of criminology and environmental data science supports applied public safety research.
The crime2024-25.csv dataset contains detailed street-level crime information for Colchester covering the April 2024 to March 2025 timeframe, with 6,047 records. Each record is a unique crime incident, as coded by an ID and possibly a persistent ID. Key variables include the nature of the crime (e.g., anti-social behaviour, violent crime), date, geographical coordinates (latitude and longitude), and street-level location at which the incident took place. There are also columns that contain contextual data such as location type, road name, and outcome status (e.g., “Under investigation” or “Unable to prosecute suspect”). Some entries have missing data on fields such as persistent ID and outcome. The information comes from UK Police data and is intended to facilitate spatial and temporal examination of crime trends for evaluating patterns and responses to local police activity.
Preprocessing ensures that the crime dataset is clean and ready for analysis. The str(df) command first checks the data frame structure to confirm data types, and colSums(is.na(df)) counts missing values in each column. The date column, originally stored as a year-month string, is parsed into a proper date format using ym() so that accurate time-based analysis is possible. The categorical variables category, location_type, and outcome_status are converted to factors to facilitate statistical modeling and plotting.
For missing values in outcome_status, the column is first converted to character type to allow safe replacement. The replace_na() function then replaces missing values with “Unknown” for uniformity and to avoid complications in later analysis. Together, these steps prepare the dataset for summarization, visualization, and modeling while preserving data integrity and keeping the dataset compatible with R data-handling functions.
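The preprocessing steps described above can be sketched as follows. This is a minimal illustration on a toy data frame, not the report's actual script: the three rows are invented, and it assumes ym() comes from the lubridate package and replace_na() from tidyr.

```r
library(lubridate)  # provides ym()
library(tidyr)      # provides replace_na()

# Toy stand-in for crime2024-25.csv (values are illustrative, not real records)
df <- data.frame(
  category       = c("violent-crime", "shoplifting", "anti-social-behaviour"),
  date           = c("2024-04", "2024-05", "2024-04"),
  location_type  = c("Force", "Force", "BTP"),
  outcome_status = c("Under investigation", "Local resolution", NA),
  stringsAsFactors = FALSE
)

str(df)                          # inspect column types
na_counts <- colSums(is.na(df))  # missing values per column
print(na_counts)

df$date          <- ym(df$date)  # "2024-04" becomes the Date 2024-04-01
df$category      <- factor(df$category)
df$location_type <- factor(df$location_type)

# Replace NA while the column is still character, then convert to factor
df$outcome_status <- replace_na(as.character(df$outcome_status), "Unknown")
df$outcome_status <- factor(df$outcome_status)
```

Replacing the missing values before the factor conversion avoids having to add “Unknown” as a new factor level afterwards.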
## 'data.frame': 6047 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ category : chr "anti-social-behaviour" "anti-social-behaviour" "anti-social-behaviour" "anti-social-behaviour" ...
## $ persistent_id : chr "" "" "" "" ...
## $ date : chr "2024-04" "2024-04" "2024-04" "2024-04" ...
## $ lat : num 51.9 51.9 51.9 51.9 51.9 ...
## $ long : num 0.896 0.904 0.895 0.921 0.898 ...
## $ street_id : int 2153038 2153245 2153000 2153730 2153077 2153077 2153426 2153593 2153012 2153237 ...
## $ street_name : chr "On or near North Hill" "On or near Bus/coach Station" "On or near Church Street" "On or near Tarrett Drive" ...
## $ context : logi NA NA NA NA NA NA ...
## $ id : int 118021898 118022736 118022480 118022387 118022363 118022329 118022316 118022288 118022276 118022270 ...
## $ location_type : chr "Force" "Force" "Force" "Force" ...
## $ location_subtype: chr "" "" "" "" ...
## $ outcome_status : chr NA NA NA NA ...
## X category persistent_id date
## 0 0 0 0
## lat long street_id street_name
## 0 0 0 0
## context id location_type location_subtype
## 6047 0 0 0
## outcome_status
## 668
| Crime Category | Frequency |
|---|---|
| anti-social-behaviour | 668 |
| bicycle-theft | 151 |
| burglary | 157 |
| criminal-damage-arson | 466 |
| drugs | 231 |
| other-crime | 91 |
| other-theft | 399 |
| possession-of-weapons | 58 |
| public-order | 451 |
| robbery | 81 |
| shoplifting | 643 |
| theft-from-the-person | 84 |
| vehicle-crime | 253 |
| violent-crime | 2314 |
Table 1: Crime Category Frequencies: This frequency table lists the number of reported offenses in each crime category. Violent crime (2,314) is the most frequent, followed by anti-social behaviour (668) and shoplifting (643). Possession of weapons (58) and robbery (81) are the least frequent. The table identifies the prominent crime types in Colchester for 2024–25.
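A frequency table of this kind can be produced with base R's table(). The sketch below uses a short invented vector in place of the real category column, so the counts are illustrative only.

```r
# Toy category vector; the real column would be df$category
category <- c("violent-crime", "shoplifting", "violent-crime",
              "drugs", "violent-crime", "shoplifting")

# Count each category and sort from most to least frequent
freq <- sort(table(category), decreasing = TRUE)

# Convert to a two-column data frame, a convenient input for knitr::kable()
freq_df <- as.data.frame(freq, stringsAsFactors = FALSE)
names(freq_df) <- c("Crime Category", "Frequency")
print(freq_df)
```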
| Outcome Status | Frequency |
|---|---|
| Action to be taken by another organisation | 114 |
| Awaiting court outcome | 314 |
| Court result unavailable | 218 |
| Formal action is not in the public interest | 51 |
| Further action is not in the public interest | 4 |
| Further investigation is not in the public interest | 2 |
| Investigation complete; no suspect identified | 2002 |
| Local resolution | 174 |
| Offender given a caution | 52 |
| Status update unavailable | 235 |
| Suspect charged as part of another case | 1 |
| Unable to prosecute suspect | 1793 |
| Under investigation | 419 |
| Unknown | 668 |
Table 2: Outcome Status Frequencies: This shows what became of each offence. Most common is “Investigation complete; no suspect identified” (2,002 cases), implying problems in identifying culprits. “Unable to prosecute suspect” (1,793) is also common. “Unknown” (668) accounts for missing data. Formal or court-based outcomes happened in only a small minority.
| LocationType | Action to be taken by another organisation | Awaiting court outcome | Court result unavailable | Formal action is not in the public interest | Further action is not in the public interest | Further investigation is not in the public interest | Investigation complete; no suspect identified | Local resolution | Offender given a caution | Status update unavailable | Suspect charged as part of another case | Unable to prosecute suspect | Under investigation | Unknown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BTP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 0 | 0 | 4 | 0 |
| Force | 114 | 314 | 218 | 51 | 4 | 2 | 2002 | 174 | 52 | 221 | 1 | 1793 | 415 | 668 |
Table 3: Outcome by Police Force: This table divides outcomes by location type (Force or BTP). All substantive outcomes (e.g., court proceedings, cautions, completed investigations) were recorded under Force. BTP contributed only 18 records, all either “Status update unavailable” (14) or “Under investigation” (4), confirming that Force processed the vast majority of crimes and follow-up actions.
| CrimeCategory | BTP | Force |
|---|---|---|
| anti-social-behaviour | 0 | 668 |
| bicycle-theft | 1 | 150 |
| burglary | 0 | 157 |
| criminal-damage-arson | 1 | 465 |
| drugs | 1 | 230 |
| other-crime | 0 | 91 |
| other-theft | 5 | 394 |
| possession-of-weapons | 0 | 58 |
| public-order | 3 | 448 |
| robbery | 1 | 80 |
| shoplifting | 0 | 643 |
| theft-from-the-person | 2 | 82 |
| vehicle-crime | 0 | 253 |
| violent-crime | 4 | 2310 |
Table 4: Crime Type by Police Force: This table compares crime categories between BTP and Force. BTP handled very few crimes (e.g., 1 bicycle theft, 4 violent crimes), while Force handled nearly all of them, including every anti-social behaviour, burglary, and shoplifting record. This is consistent with general street crime in Colchester falling within local police jurisdiction.
##Graph 1: This graph shows the monthly number of cases between May 2024 and March 2025. Starting from about 550 cases in May 2024, the number decreases steadily in each subsequent month. By March 2025 it has fallen to about 400 cases, a clear downward trend over the period.
##Graph 2: This graph shows the top 10 crime categories in the dataset. Violent crime, anti-social behaviour, and shoplifting are the three most common crimes in the city. Violent crime occurred more than 2,000 times (2,314 in Table 1), whereas bicycle theft and burglary are the least frequent of the ten categories shown.
##Graph 3: This bar chart shows the distribution of case outcome statuses. The most frequent are “Investigation complete; no suspect identified” and “Unable to prosecute suspect,” with about 2,000 and 1,800 cases respectively (Table 2). “Under investigation” and “Status update unavailable” are also frequent. Several of the outcomes indicate that no further action was taken.
## street_name n
## 1 On or near Shopping Area 498
## 2 On or near Supermarket 488
## 3 On or near Nightclub 196
## 4 On or near George Street 147
## 5 On or near Conference/exhibition Centre 138
## 6 On or near Parking Area 137
## 7 On or near Culver Street West 136
## 8 On or near Police Station 135
## 9 On or near St Nicholas Street 112
## 10 On or near Cowdray Avenue 108
##Graph 4: This pie chart shows the five most common crime types. By the counts in Table 1, violent crime is the largest category (2,314), followed by anti-social behaviour (668), shoplifting (643), criminal damage and arson (466), and public order offences (451).
##Graph 5: This histogram displays the frequency distribution of latitude values, from 51.875 to 51.901. The y-axis shows counts (0–750), and peaks mark regions of high density. The tallest bar (presumably around 51.890–51.900) suggests a cluster of data points, indicating a geographic concentration of crime incidents.
##Graph 6: This violin plot shows how the spread of latitude values varies between outcome statuses. All outcomes except “Investigation complete; no suspect identified” are tightly concentrated around latitude 51.885, whereas that outcome is also dense around this value but covers a larger range.
##Graph 7: This scatter plot shows longitude (x-axis, 0.88–0.92) against latitude (y-axis, 51.875–51.905) for different crime types in one specific area. Every point is a crime occurrence, color-coded by type (for example, theft, violence). The graph shows how crime is distributed geographically, with clusters of points revealing areas of high crime concentration.
##Graph 8: This correlation heatmap illustrates the relationship of latitude (lat) and longitude (long) with other variables. The correlations range from -1 to -0.13, indicating weak to strong negative relationships. A denser color indicates a stronger negative association, meaning the related variable decreases as the lat/long value increases.
##Graph 9: This box plot shows the distribution of latitudes (y-axis, 51.875–51.905) across various types of crimes (x-axis). Box plots for every crime type (e.g., drugs, theft) show median, quartiles, and outliers. Differences in latitude ranges suggest geospatial clusters of crimes—different types of crimes might be more frequent at specific locations.
##Graph 10: This script identifies the 20 streets with the most recorded crimes in Colchester by grouping and counting the dataset by street. It then uses the leaflet package to create an interactive map on which these streets are plotted with circle markers. Marker size and color saturation indicate the number of crimes. The script includes a legend for clarity, so users can visually identify high-crime locations, which supports spatial analysis of crime.
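The original script is not reproduced in this report, so the following is only a sketch of how such a map could be built. The street names and coordinates are invented, the counting is done with base R's aggregate() as a stand-in, and the radius formula and "Reds" palette are illustrative choices. The map is drawn only if the leaflet package is installed.

```r
# Toy incidents; street names and coordinates are illustrative only
crimes <- data.frame(
  street_name = c("On or near A", "On or near A", "On or near B",
                  "On or near A", "On or near C", "On or near B"),
  lat  = c(51.889, 51.889, 51.892, 51.889, 51.886, 51.892),
  long = c(0.900, 0.900, 0.905, 0.900, 0.897, 0.905)
)

# Count incidents per street (each street has one coordinate pair here)
counts <- aggregate(n ~ street_name + lat + long,
                    data = transform(crimes, n = 1), FUN = sum)
top <- head(counts[order(-counts$n), ], 20)  # up to 20 busiest streets

# Draw the interactive map only if leaflet is available
if (requireNamespace("leaflet", quietly = TRUE)) {
  pal <- leaflet::colorNumeric("Reds", domain = top$n)
  m <- leaflet::leaflet(top)
  m <- leaflet::addTiles(m)
  m <- leaflet::addCircleMarkers(m, lng = ~long, lat = ~lat,
                                 radius = ~4 + 2 * n,  # size grows with count
                                 color = ~pal(n), fillOpacity = 0.7,
                                 label = ~paste0(street_name, ": ", n))
  m <- leaflet::addLegend(m, position = "bottomright", pal = pal,
                          values = ~n, title = "Crimes")
  m  # printing m renders the map in an interactive session or knitted document
}
```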
The 2023–24 temperature data consist of 366 daily weather readings from a station in the Colchester area (station_ID 3590), from April 1, 2023, to March 31, 2024. Every row is a weather reading for a specific date. Key variables are mean, maximum, and minimum temperature (°C), dew point temperature (TdAvgC), mean humidity (HrAvg%), wind speed and direction, sea-level and station-level air pressure (PresslevHp, PreselevHp), and rainfall (Precmm). Other variables capture cloud cover (TotClOct, lowClOct), sunshine duration (SunD1h), visibility (VisKm), and snow depth (SnowDepcm). Missing values are present in a few columns, notably snow depth and context-related measures. The dataset supports correlation analysis between weather conditions and other variables, such as crime patterns, and provides contextual insight into the environment surrounding local crimes.
| Wind Direction | No Precipitation | Precipitation |
|---|---|---|
| E | 9 | 2 |
| ENE | 18 | 3 |
| ESE | 8 | 5 |
| N | 6 | 4 |
| NE | 17 | 4 |
| NNE | 7 | 4 |
| NNW | 6 | 6 |
| NW | 8 | 4 |
| S | 11 | 14 |
| SE | 3 | 6 |
| SSE | 5 | 10 |
| SSW | 14 | 22 |
| SW | 21 | 26 |
| W | 13 | 12 |
| WNW | 11 | 6 |
| WSW | 24 | 27 |
Preprocessing begins by converting the Date column of the weather data to a proper Date format using as.Date(), which keeps time-dependent operations accurate. A two-way frequency table is then built to examine the relationship between wind direction (WindkmhDir) and the occurrence of precipitation (Precmm > 0). The > operator creates a logical value that is TRUE on days with precipitation and FALSE on dry days. The table is converted to a data frame and rendered with the kable() function for tidy tabular presentation.
The resulting table shows precipitation patterns by wind direction. For instance, WSW (West-Southwest) registered the highest number of rainy days (27), followed by SW (Southwest) with 26 and SSW (South-Southwest) with 22. Meanwhile, easterly winds such as E and ENE had fewer precipitation days. These results suggest that precipitation in Colchester is most common when the wind comes from southern and western directions, which is useful context for examining weather-crime relationships.
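The cross-tabulation described above can be sketched in base R as follows. The four weather rows are invented for illustration; only the column names (Date, WindkmhDir, Precmm) are taken from the dataset description.

```r
# Toy daily weather rows; column names follow the source data
weather <- data.frame(
  Date       = c("2023-04-01", "2023-04-02", "2023-04-03", "2023-04-04"),
  WindkmhDir = c("SW", "SW", "E", "WSW"),
  Precmm     = c(2.4, 0, 0, 5.1)
)
weather$Date <- as.Date(weather$Date)

# Precmm > 0 yields TRUE for wet days and FALSE for dry days
tab <- table(weather$WindkmhDir, weather$Precmm > 0)
colnames(tab) <- c("No Precipitation", "Precipitation")  # FALSE, TRUE order

# Wide data frame with one row per wind direction
wide <- as.data.frame.matrix(tab)
print(wide)
# knitr::kable(wide) would render this as a formatted table
```

The column renaming relies on both dry and wet days being present, so that the table has exactly a FALSE and a TRUE column.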
##Graph 1: This bar chart displays the number of days for each wind direction (for example, N, S, E, W, and intermediates such as ENE and SSW). Compass directions are shown along the x-axis; the y-axis, which is not visible, likely measures counts. Taller bars denote more frequent wind directions, which helps identify the prevailing winds in the dataset.
##Graph 2: This histogram shows the distribution of average temperature. Most daily values lie between 5 and 10 °C. On a few days, mostly in winter, the temperature dropped below 0 °C, and on some days it rose above 20 °C.
##Graph 3: This boxplot compares mean temperatures (°C) across wind directions (e.g., N, S, E, W). The y-axis shows temperature (10–20°C), and the x-axis shows wind directions, which are only partially visible. The boxes show the median, quartiles, and outliers, indicating how temperature distributions vary with wind direction.
##Graph 4: This scatter plot investigates the relationship between atmospheric pressure (x-axis, hPa) and average temperature (y-axis, °C). The temperature axis appears implausibly broad (5–1200°C), which is most likely a labelling or scaling error rather than real data. The pressure values (up to about 1040 hPa) are in the range expected for sea-level readings. The correlation matrix below reports only a weak relationship between the two variables (r = 0.09).
| Variable | TemperatureCAvg | TemperatureCMax | TemperatureCMin | PresslevHp | Precmm |
|---|---|---|---|---|---|
| TemperatureCAvg | 1.00 | 0.98 | 0.95 | 0.09 | 0.03 |
| TemperatureCMax | 0.98 | 1.00 | 0.89 | 0.12 | -0.02 |
| TemperatureCMin | 0.95 | 0.89 | 1.00 | 0.02 | 0.09 |
| PresslevHp | 0.09 | 0.12 | 0.02 | 1.00 | -0.42 |
| Precmm | 0.03 | -0.02 | 0.09 | -0.42 | 1.00 |
This correlation matrix shows strong positive correlations among the temperature variables, especially between mean and maximum temperature (r = 0.98). Precipitation (Precmm) is moderately negatively correlated with pressure (r = -0.42), suggesting that rain tends to fall on low-pressure days. The correlations between precipitation and the temperature variables are weak or close to zero.
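A matrix like this can be computed with base R's cor(). The sketch below runs on synthetic data that reuses the column names, so its numbers will not match the table; the use = "complete.obs" setting is an assumption about how missing readings were handled.

```r
set.seed(1)  # synthetic data, for illustration only
n   <- 100
avg <- rnorm(n, mean = 11, sd = 5)
weather <- data.frame(
  TemperatureCAvg = avg,
  TemperatureCMax = avg + runif(n, 2, 6),
  TemperatureCMin = avg - runif(n, 2, 6),
  PresslevHp      = rnorm(n, mean = 1013, sd = 10),
  Precmm          = pmax(0, rnorm(n, mean = 1, sd = 3))
)
weather$Precmm[5] <- NA  # simulate a missing reading

# Pearson correlations on rows without missing values, rounded to 2 decimals
cor_mat <- round(cor(weather, use = "complete.obs"), 2)
print(cor_mat)
```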
##Graph 5: This time series plot tracks average temperature (°C) from April 2023 through April 2024, with a LOESS-smoothed trend line highlighting seasonal patterns. Peaks presumably correspond to the summer months (e.g., Aug 2023), and troughs to the cold periods. The smoothed line makes the longer-term cycles of warming and cooling easier to see amid day-to-day fluctuations.
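LOESS smoothing of a daily series can be sketched with base R's loess(). The temperatures below are synthetic (a seasonal sine wave plus noise), and the span of 0.3 is an arbitrary illustrative choice; the report's own plot may have used different settings or a plotting package's built-in smoother.

```r
set.seed(42)  # synthetic daily temperatures, for illustration only
dates <- seq(as.Date("2023-04-01"), as.Date("2024-03-31"), by = "day")
day   <- as.numeric(dates - dates[1])

# Seasonal cycle peaking in summer, plus day-to-day noise
temp <- 11 + 8 * sin(2 * pi * (day - 30) / 366) + rnorm(length(day), sd = 2)

# Fit a LOESS curve of temperature against time
fit   <- loess(temp ~ day, span = 0.3)
trend <- predict(fit)  # smoothed value for each day

plot(dates, temp, pch = 16, cex = 0.4, col = "grey50",
     xlab = "Date", ylab = "Average temperature (deg C)")
lines(dates, trend, col = "blue", lwd = 2)
```

A smaller span follows the daily values more closely, while a larger one gives a smoother seasonal curve.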
##Graph 6: This interactive time series graph plots average temperature (°C) from April 2023 to April 2024, with values ranging from about 5°C to 20°C. The user should be able to zoom in or hover for details. The graph shows the seasonal movement, with warmer highs (e.g., summer 2023) and colder lows (e.g., winter 2024), so climate trends can be explored dynamically. Gaps from missing data points make the rendering incomplete.
The 2024–25 climate dataset contains 365 daily values from a Colchester-region weather station (station_ID 3590), starting on April 1, 2024, and ending on March 31, 2025. Each row captures the weather measurements for a given day. Key variables are mean, maximum, and minimum temperature (°C), dew point (TdAvgC), humidity (HrAvg%), wind direction and speed, sea-level pressure (PresslevHp), rainfall (Precmm), cloud cover (TotClOct, lowClOct), sunshine duration (SunD1h), visibility (VisKm), and snow depth (SnowDepcm), with some missing values, mainly in the snow fields. These details help identify seasonal weather patterns, extreme events, and anomalies. The dataset can also be correlated with other datasets, such as the crime data, to explore possible climate-related effects on human behavior in Colchester.
| Wind Direction | No Precipitation | Precipitation |
|---|---|---|
| E | 13 | 4 |
| ENE | 6 | 5 |
| ESE | 5 | 6 |
| N | 6 | 8 |
| NE | 11 | 8 |
| NNE | 7 | 4 |
| NNW | 2 | 15 |
| NW | 10 | 11 |
| S | 8 | 9 |
| SE | 8 | 6 |
| SSE | 10 | 11 |
| SSW | 11 | 21 |
| SW | 18 | 23 |
| W | 18 | 16 |
| WNW | 12 | 13 |
| WSW | 22 | 15 |
The dataset is first cleaned by converting the Date column to a proper Date type using as.Date() for reliable time filtering. It is then filtered to keep only records from 2024 and 2025 using filter(year(Date) %in% c(2024, 2025)). A two-way frequency table is built with table(), cross-tabulating wind direction (WindkmhDir) against the presence of precipitation, coded Yes if Precmm > 0 and No otherwise. The table is then reshaped into a data frame and displayed with kable().
Table Result: The table shows that precipitation days are most frequent when winds come from southern and western directions, particularly SW (23), SSW (21), and W (16). NNW is a notable exception, with 15 of its 17 days recording rain. Easterly directions (E, ENE, ESE) account for fewer precipitation days (4 to 6 each). This indicates an association between wind direction and rain in Colchester: winds from the southwest and west were the most likely to carry moisture and precipitation in 2024–25.
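Because wind directions differ in how many days they account for, raw counts can be complemented by the share of wet days per direction. The sketch below does this with prop.table() for four rows copied from the 2024–25 table; this row-share calculation is an illustrative addition and not part of the original analysis.

```r
# Counts copied from the 2024-25 wind/precipitation table for four directions
tab <- matrix(c(13,  4,    # E
                 2, 15,    # NNW
                18, 23,    # SW
                22, 15),   # WSW
              ncol = 2, byrow = TRUE,
              dimnames = list(c("E", "NNW", "SW", "WSW"),
                              c("No Precipitation", "Precipitation")))

# Row-wise proportions: share of each direction's days without and with rain
share <- round(prop.table(tab, margin = 1), 2)
print(share)
```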
##Graph 1: This graph illustrates the five most frequent wind directions of 2024–25: SW (25.8%), WSW (22.6%), W (19.4%), SSW (18.8%), and WNW (13.4%). The percentages sum to 100%, so they are shares among these five directions rather than of all days. They show that southwesterly and westerly winds dominate in this area.
##Graph 2: This plot shows the distribution of daily maximum temperatures (°C) for 2024 and 2025 over a range of 0°C to 30°C. Overlaid curves mark the peaks in frequency, and denser regions indicate the most common temperature ranges. This helps in identifying year-to-year variation, unusual weather patterns, or climatic anomalies.
##Graph 3: This violin plot shows the distribution of sunshine duration (hours) by wind direction (e.g., S, SW, W). Combining a boxplot with a density curve, it shows the median, spread, and frequency of sunshine hours for each wind direction. Suspicious labels (e.g., “MHz,” “VISV”) are probably data errors or encoding issues.
##Graph 4: This time series plots pressure values (hPa) from April 2024 to April 2025, using GAM smoothing to reveal trends. Pressure ranges from 980 to 1040 hPa, and the changes most likely reflect passing weather systems. The smoothed curve makes broader patterns visible, with dips suggesting stormy periods and peaks suggesting high-pressure periods. Note: “MPa” is likely a unit mistake (should read hPa).
##Graph 5: This interactive scatter plot displays maximum against minimum temperatures (°C) for 2024 and 2025, with points colored by year. The x-axis shows minimum temperature (0–10°C), and the y-axis shows maximum temperature (0–30°C). Users can likely hover or click for details. The plot reveals the positive relationship between the two (higher maximum temperatures occur with higher minimum temperatures) and the year-to-year variability of temperature extremes.